Outlier Detection from a Mixture Distribution When Training Data Are Unlabeled

Authors

  • Stephan R. Sain
  • H. L. Gray
Abstract

We consider the difficult task of using seismic signals (or any other discriminants) to detect nuclear explosions among the large number of background signals such as earthquakes and mining blasts. Given a ground-truth database (i.e., labeled data), Fisk et al. (1996) consider the problem of detecting outliers (nuclear explosions) from a single background-signal population, and their approach has been applied successfully in several regions around the world. Wang et al. (1997) approach the problem by modeling the background as a mixture distribution and looking for outliers (nuclear events) from that mixture. However, those authors considered only the case in which at least some fraction of the training sample was labeled, that is, at least some ground-truth information was available, and the number of distinct classes of events was known. In the current article, we extend these results to the case in which no events in the training sample are labeled and also to the case in which the number of event types represented in the training sample is unknown. One can view the mixture approach as a robust method for fitting a density to training data that may not be normally distributed, whether or not the data consist of identifiable components that have a physical interpretation. The technique is demonstrated using simulated data as well as two sets of seismic data.
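The general idea in the abstract (fit a mixture to unlabeled background data, choose the number of components from the data, then flag low-density observations) can be illustrated with a minimal sketch. This is not the authors' procedure: the one-dimensional Gaussian components, the BIC rule for choosing the number of components, the quantile-based density threshold, and every function name below are assumptions made for illustration only.

```python
import math
import random


def normal_pdf(x, mu, var):
    """Density of a univariate normal with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def mixture_pdf(x, params):
    """Mixture density; params is a list of (weight, mean, variance)."""
    return sum(w * normal_pdf(x, mu, var) for w, mu, var in params)


def fit_mixture(data, k, iters=100, min_var=1e-3):
    """EM for a k-component 1-D Gaussian mixture on unlabeled data."""
    xs = sorted(data)
    n = len(xs)
    mean = sum(xs) / n
    var0 = max(sum((x - mean) ** 2 for x in xs) / n, min_var)
    # Deterministic start: means at evenly spaced sample quantiles.
    params = [(1.0 / k, xs[(2 * j + 1) * n // (2 * k)], var0) for j in range(k)]
    for _ in range(iters):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, mu, var) for w, mu, var in params]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and (floored) variances.
        new_params = []
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            mu = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mu) ** 2 for r, x in zip(resp, xs)) / nj
            new_params.append((nj / n, mu, max(var, min_var)))
        params = new_params
    loglik = sum(math.log(max(mixture_pdf(x, params), 1e-300)) for x in xs)
    return params, loglik


def select_mixture(data, max_k=4):
    """Choose k by BIC (assumed criterion); a k-component fit has 3k - 1 free parameters."""
    best = None
    for k in range(1, max_k + 1):
        params, loglik = fit_mixture(data, k)
        bic = -2 * loglik + (3 * k - 1) * math.log(len(data))
        if best is None or bic < best[0]:
            best = (bic, k, params)
    return best[1], best[2]


def density_threshold(data, params, alpha=0.01):
    """Density value below which roughly a fraction alpha of training points fall."""
    dens = sorted(mixture_pdf(x, params) for x in data)
    return dens[int(alpha * len(dens))]


def is_outlier(x, params, threshold):
    """Flag x when the fitted background density at x is below the threshold."""
    return mixture_pdf(x, params) < threshold


# Simulated unlabeled background with two event types (hypothetical data).
rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(300)]
train += [rng.gauss(8.0, 1.0) for _ in range(300)]
k, params = select_mixture(train)
threshold = density_threshold(train, params)
print("components chosen:", k)
print("x = 20.0 flagged:", is_outlier(20.0, params, threshold))
print("x = 0.5 flagged:", is_outlier(0.5, params, threshold))
```

The threshold here is set so that about 1% of the training points themselves would be flagged, which plays the role of a false-alarm rate. A formal test with a calibrated null distribution would be needed for a rigorous detector.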


Similar resources

Detecting Suspicious Card Transactions in unlabeled data of bank Using Outlier Detection Techniques

With the advancement of technology, the use of ATM and credit cards has increased. Cyber fraud and theft are among the threats that come with these technologies. It is therefore necessary to use fraud detection algorithms to prevent fraudulent use of bank cards. Credit card fraud can be thought of as a form of identity theft that consists of an unauthorized access to another person's ...


A New Approximation for the Null Distribution of the Likelihood Ratio Test Statistics for k Outliers in a Normal Sample

Usually when performing a statistical test or estimation procedure, we assume the data are all observations of i.i.d. random variables, often from a normal distribution. Sometimes, however, we notice in a sample one or more observations that stand out from the crowd. These observation(s) are commonly called outlier(s). Outlier tests are more formal procedures which have been developed for detec...


A statistical test for outlier identification in data envelopment analysis

In the use of peer group data to assess individual, typical or best practice performance, the effective detection of outliers is critical for achieving useful results. In these ‘‘deterministic’’ frontier models, statistical theory is now mostly available. This paper deals with the statistical pared sample method and its capability of detecting outliers in data envelopment analysis. In the prese...


Application of Recursive Least Squares to Efficient Blunder Detection in Linear Models

In many geodetic applications a large number of observations are being measured to estimate the unknown parameters. The unbiasedness property of the estimated parameters is only ensured if there is no bias (e.g. systematic effect) or falsifying observations, which are also known as outliers. One of the most important steps towards obtaining a coherent analysis for the parameter estimation is th...


Statistical Techniques in Anomaly Intrusion Detection System

In this paper, we analyze an anomaly based intrusion detection system (IDS) for outlier detection in hardware profile using statistical techniques: Chi-square distribution, Gaussian mixture distribution and Principal component analysis. Anomaly detection based methods can detect new intrusions but they suffer from false alarms. Host based Intrusion Detection Systems (HIDSs) use anomaly detectio...



Publication date: 2005